SOC2069: Researching Social Life 1

Quantitative data and descriptive statistics

Dr. Chris Moreh

Outline

  1. Variables
  2. Descriptive statistics

Variables

What is a variable?

  • Statistical methods help us determine the factors that explain variability among subjects/respondents
  • For instance, variation occurs from student to student in their grades. What factors are responsible for that variability?
  • Any characteristic that we can measure for each subject is called a variable
  • Variables are characteristics that can vary in value among subjects in a sample or population
  • Examples of variables are income last year, number of children or siblings, whether employed, gender, how much one likes ice-cream on a scale of 1 to 10, etc.
  • The values the variable can take form the measurement scale
  • For gender, for instance, the measurement scale consists of two or more labels, such as (female, male, other). For number of children/siblings, it would be (0, 1, 2, 3, 4, …)
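To make the idea concrete, here is a minimal Python sketch that stores a small data set with one record per subject and one key per variable. The respondents and their values are hypothetical, invented for illustration. `values_of` lists a variable's value for every subject, showing how it varies from subject to subject; `observed_scale` lists the distinct values that actually occur, which is only the part of the full measurement scale present in this sample.

```python
# Hypothetical survey data: each dict is one subject (respondent),
# each key is a variable measured for every subject.
respondents = [
    {"gender": "female", "children": 2, "employed": True},
    {"gender": "male", "children": 0, "employed": False},
    {"gender": "other", "children": 1, "employed": True},
    {"gender": "female", "children": 0, "employed": True},
]

def values_of(variable, data):
    """Return the value of one variable for every subject, in order."""
    return [row[variable] for row in data]

def observed_scale(variable, data):
    """Return the distinct values observed for a variable, sorted.

    This is only the observed part of the measurement scale: nobody here
    has 3 children, although 3 is a valid value on that scale.
    """
    return sorted(set(values_of(variable, data)))

for var in ("gender", "children", "employed"):
    print(var, values_of(var, respondents), observed_scale(var, respondents))
```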

Measurement scales

  • A variable is called quantitative when the measurement scale has numerical values that represent different magnitudes of the variable
  • A variable is called categorical when the measurement scale is a set of categories
  • For categorical variables, distinct categories differ in quality, not in numerical magnitude. For this reason, categorical variables are often called qualitative (but we will avoid that term here, to prevent confusion with the type of qualitative data covered in the first half of the module)

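Categorical variables have either nominal scales (unordered categories, such as gender) or ordinal scales (ordered categories, such as a rating from "strongly disagree" to "strongly agree"); quantitative variables have interval scales, with specific numerical distances between levels. The position of ordinal scales on the quantitative–categorical classification is fuzzy. Because their scale is a set of categories, they are often analysed using the same methods as nominal scales. In many respects, however, ordinal scales more closely resemble interval scales: they possess an important quantitative feature, in that each level has a greater or smaller magnitude than another level.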
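One way to see the nominal/ordinal distinction in code is the Python sketch below, built on the standard `enum` module. The `Gender` and `Agreement` classes and the five-point agreement item are hypothetical examples. An ordinal variable supports "greater than" comparisons (and therefore a middle category, the median), while a nominal variable supports only equality. The integer codes 1 to 5 record order only; treating the gaps between them as equal would be an interval-scale assumption.

```python
from enum import Enum, IntEnum
from statistics import median_low

class Gender(Enum):
    """Nominal scale: categories differ in kind and have no order."""
    FEMALE = "female"
    MALE = "male"
    OTHER = "other"

class Agreement(IntEnum):
    """Ordinal scale (hypothetical survey item): categories are ordered,
    but the integer codes say nothing about distances between them."""
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5

def is_ordered(a, b):
    """Return True if '<' is defined between two category values."""
    try:
        a < b
    except TypeError:  # plain Enum members do not support ordering
        return False
    return True

responses = [Agreement.AGREE, Agreement.NEUTRAL, Agreement.STRONGLY_AGREE]

print(is_ordered(Agreement.AGREE, Agreement.DISAGREE))  # ordinal: True
print(is_ordered(Gender.FEMALE, Gender.MALE))           # nominal: False
print(median_low(responses).name)                       # AGREE
```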

A variable is discrete if its possible values form a set of separate numbers, such as (0, 1, 2, 3, …), as with number of children.

It is continuous if it can take any value on an infinite continuum of real numbers.

Where do variables come from?

Descriptive statistics

Describing categorical variables

  • Categorical data are characterized by a frequency distribution
  • A frequency table is a listing of possible values for a variable, together with the number of observations (n) at each value
  • When the table shows proportions or percentages instead of counts, it is called a relative frequency distribution
  • Frequency distributions can also be visualised with a bar graph
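A frequency table and its relative version can be sketched in a few lines of Python with `collections.Counter` from the standard library. The employment-status responses are hypothetical. Each row gives a category, its count (n) and its percentage of all observations, ordered from most to least frequent.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, n, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, 100 * n / total) for value, n in counts.most_common()]

# Hypothetical responses to a categorical variable
employment = ["employed", "unemployed", "employed", "student",
              "employed", "student", "retired", "employed"]

for value, n, pct in frequency_table(employment):
    print(f"{value:<12}{n:>3}{pct:>7.1f}%")
```

Dropping the `n` column gives the relative frequency distribution; dropping `pct` gives the plain frequency table.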

Describing numeric variables

Quantitative variables can be summarised by measures of central tendency and variation (spread)


Central tendency

The mode also applies to categorical variables, where it is arguably more useful: it identifies the category with the highest frequency.
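Python's standard `statistics` module provides all three measures of central tendency. The sketch below uses hypothetical data on the number of children per respondent, then applies `mode` to a hypothetical categorical variable, as the slide describes. The mean uses every value, whereas the median depends only on the middle of the sorted data.

```python
from statistics import mean, median, mode

children = [0, 2, 1, 0, 3, 2, 0, 1]  # hypothetical: children per respondent

print(mean(children))    # arithmetic mean: 9 / 8 = 1.125
print(median(children))  # average of the two middle sorted values: 1.0
print(mode(children))    # most frequent value: 0

# The mode also works for categorical data
print(mode(["female", "male", "female", "other"]))  # female
```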

Variation (spread)
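The usual measures of spread can be sketched in Python as follows, again on hypothetical data. The range is the maximum minus the minimum. The sample variance is the sum of squared deviations from the mean divided by n − 1, and the standard deviation is its square root; the hand-written version is cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which use the same n − 1 convention (`pvariance` and `pstdev` divide by n instead, for a whole population).

```python
import math
from statistics import mean, stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

children = [0, 2, 1, 0, 3, 2, 0, 1]  # hypothetical data

value_range = max(children) - min(children)
s2 = sample_variance(children)
s = math.sqrt(s2)

# Cross-check the manual formulas against the standard library
assert math.isclose(s2, variance(children))
assert math.isclose(s, stdev(children))
print(value_range, round(s2, 3), round(s, 3))
```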

Describing numeric variables

Quantitative variables can be visualised with a histogram: a bar graph of a frequency distribution in which the numeric values are grouped into intervals (bins)
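The grouping step behind a histogram can be sketched in plain Python. `histogram_counts` is a hypothetical helper, and the ages are invented for illustration. Each bin covers the half-open interval [lo, lo + width); values outside all bins are ignored. The text bars printed at the end stand in for the histogram's columns.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in bins [start + k*width, start + (k+1)*width).

    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in values:
        k = int((x - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

ages = [19, 22, 25, 31, 34, 35, 38, 41, 47, 52, 58, 63]  # hypothetical
counts = histogram_counts(ages, start=10, width=10, n_bins=6)

for k, n in enumerate(counts):
    lo = 10 + k * 10
    # Labels like "30-39" assume whole-number ages
    print(f"{lo}-{lo + 9}: {'#' * n}")
```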

The normal distribution

Skewed distribution

Quartiles and outliers

Box plots are useful because they provide a visual summary of the data, enabling researchers to quickly identify central values, the dispersion of the data, and signs of skewness.

  • Minimum

    The lowest score, excluding outliers (shown at the end of the left whisker).

  • Lower Quartile

    Twenty-five percent of scores fall below the lower quartile value (also known as the first quartile).

  • Median

    The median (sometimes known as the second quartile) marks the mid-point of the data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value and half are less.

  • Upper Quartile

    Seventy-five percent of the scores fall below the upper quartile value (also known as the third quartile). Thus, 25% of data are above this value.

  • Maximum

    The highest score, excluding outliers (shown at the end of the right whisker).

  • Whiskers

    The upper and lower whiskers represent scores outside the middle 50% (i.e. the lower 25% and the upper 25% of scores, excluding outliers).

  • The Interquartile Range (or IQR)

    The box itself shows the middle 50% of scores; the IQR is the distance between the 25th and 75th percentiles (upper quartile minus lower quartile).
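The quantities in the list above can be computed with Python's `statistics.quantiles`. This is a sketch under stated assumptions: the scores are hypothetical, the quartiles use the `"inclusive"` interpolation method (quartile conventions differ between software packages, so results may differ slightly from other tools), and outliers are defined with Tukey's 1.5 × IQR rule, a common convention that the slides do not specify.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, IQR, whisker ends and outliers for a box plot.

    Assumes Tukey's rule: outliers lie more than 1.5 * IQR below the
    lower quartile or above the upper quartile.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {
        "minimum": min(inside),      # end of the lower whisker
        "lower_quartile": q1,
        "median": q2,
        "upper_quartile": q3,
        "maximum": max(inside),      # end of the upper whisker
        "iqr": iqr,
        "outliers": sorted(outliers),
    }

scores = [2, 4, 4, 5, 6, 7, 7, 8, 9, 30]  # hypothetical scores
for name, value in box_plot_summary(scores).items():
    print(f"{name:<15}{value}")
```

Here the value 30 lies far above the upper fence, so it is reported as an outlier and the maximum (end of the upper whisker) is 9.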
